Machine learning models for detecting the causes of conditions of a satellite communication system

ABSTRACT

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training and using machine learning models to detect problems in a satellite communication system. In some implementations, one or more feature vectors that respectively correspond to different times are obtained. The feature vector(s) are provided as input to one or more machine learning models trained to receive at least one feature vector that includes feature values representing properties of the satellite communication system and output an indication of potential causes of a condition of the satellite communication system based on the properties of the satellite communication system. A particular cause that is indicated as being a most likely cause of the condition of the satellite communication system is determined based on one or more machine learning model outputs received from each of the one or more machine learning models.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/118,836, filed Aug. 31, 2018, now allowed, which is incorporated by reference in its entirety.

BACKGROUND

Satellite communication systems are complex systems that include multiple subsystems which, in turn, are made of multiple software and hardware components. The subsystems and components can also have multiple instances of software processes that implement the subsystems or components. For example, there may be multiple instances of Internet Protocol (IP) traffic handling subsystems to handle network traffic for a single satellite communication channel. Each subsystem, component, and software instance can have many pieces of status and statistical information that help define the state of the subsystem, component, or software instance.

SUMMARY

In some implementations, a communication system, e.g., a satellite communication system, can train and use machine learning models to predict causes of a condition of the satellite communication system. For example, a computer system can train machine learning models using data specifying properties of a satellite communication system at one or more points in time and, for each point in time, a label that indicates a cause (e.g., a primary cause or an initial or triggering cause) of a condition (e.g., a problem, degradation, or potential problem) of the satellite communication system at that time. The properties can include the status, statistics, metrics, and/or other appropriate data for each of multiple subsystems and components of the satellite communication system. The trained machine learning models can output machine learning outputs that indicate one or more potential causes of the condition of the satellite communication system based on properties of the satellite communication system.

The machine learning outputs can also indicate, for each potential cause, a probability that the potential cause is the actual cause of the condition of the satellite communication system. The computer system can use the probabilities output by multiple machine learning models for a given point in time or over a given time period to determine a most likely cause (e.g., a primary cause) of the condition of the satellite communication system. For example, the system can identify, as the most likely cause, a cause having the highest probability across the probabilities output by the machine learning models. In another example, the system can identify, as the most likely cause, a cause having a probability that has increased over time and that exceeds a threshold probability. In yet another example, the system can identify, as the most likely cause, a cause having a probability that has increased to exceed the probabilities of other potential causes.

The computer system can also select an action that alters the operation of the satellite communication system based on the identified most likely cause and/or the properties of the satellite communication system that were provided as input to the machine learning models. For example, if a slow storage device is slowing other components, preventing the other components from operating, and/or degrading the performance of the satellite communication system, the computer system can determine that the slow storage device is the most likely cause of the degraded performance (e.g., rather than the components affected by the slow storage device). The computer system can select switching to a different storage device as an action to take in response to the detected system conditions. The computer system can then provide, to a device, an indication of the most likely cause of the condition of the satellite communication system and/or the selected action. The device can present (e.g., using a display or a spoken language interface) the most likely cause and/or the selected action to an operator. The operator can then cause a network management system (or the actual component or subsystem) to perform the selected action. In some implementations, the computer system can cause the selected action to be performed automatically, e.g., without input from the operator.

As satellite communication systems are complex and include many subsystems and components, determining a cause of a condition of the satellite system can be difficult and time-consuming. For example, determining a primary cause of a current problem can involve triaging at multiple levels of the operations. For problems that do not get resolved using a well-known procedure of evaluating certain data and restarting certain subsystems, the triage escalates to higher tiers of the support hierarchy. By the time someone is able to perform a deep dive into the problem, a significant amount of time can pass, and the problem can get worse and cause further degradation to the performance of the satellite communication system.

Using the machine learning techniques described herein, a computer system can determine the most likely cause of a condition of a satellite communication system that would otherwise not be detected. The machine learning models can detect and provide information indicating causes that are based on information (e.g., status and statistics information) for multiple subsystems or combinations of components that would not be evaluated by a human operator or expert.

The computer system can also adapt the machine learning models to changes in the satellite communication system, for example, by retraining the models using newly detected causes and their associated properties of the satellite communication system. This is advantageous over a rules-based system that a human operator or expert would have to adjust over time based on changes to the satellite communication system or changes in the performance of the satellite communication system. For example, a speed-based threshold for determining that a component is slower than normal may have to be adjusted each time the satellite communication system is altered such that the component operates at a higher speed. The machine learning models can be updated (e.g., retrained) to account for such changes over time. For example, the machine learning models can be retrained using updated data regarding the properties of the satellite communication system and the cause of the condition of the satellite communication system corresponding to those properties.

The machine learning models can also be used to determine the causes of conditions of other satellite communication networks, e.g., satellite communication systems that are similar to the satellite communication system for which the machine learning models are trained. This allows for the detection of causes of conditions of satellite communication systems for which a sufficient amount of data is not available, such as newly deployed satellite communication systems.

In one general aspect, the techniques disclosed herein describe methods of training and using machine learning models to determine a cause of a condition of a satellite communication system. For example, a method performed by one or more computers can include: obtaining, by the one or more computers, one or more feature vectors that respectively correspond to different times, each feature vector including feature values that represent properties of a satellite communication system at the time corresponding to the feature vector; providing, by the one or more computers, the one or more feature vectors as input to one or more machine learning models, each of the one or more machine learning models being trained to receive at least one feature vector that includes feature values representing properties of the satellite communication system and output an indication of potential causes of a condition of the satellite communication system based on the properties of the satellite communication system; receiving, by the one or more computers and from each of the one or more machine learning models, one or more machine learning model outputs that indicate one or more potential causes of a condition of the satellite communication system based on the properties of the satellite communication system represented by the one or more feature vectors; determining, by the one or more computers and based on the one or more machine learning model outputs received from each of the one or more machine learning models, a particular cause indicated as being a most likely cause of the condition of the satellite communication system; and providing, to a device, an indication of the particular cause of the condition of the satellite communication system.

Implementations can include one or more of the following features. For example, some implementations include selecting, by the one or more computers, an action to alter network operation of the satellite communication system based at least on the particular cause and causing the selected action to be performed.

Some implementations include training the one or more machine learning models using labeled training data for a particular satellite communication system. The labeled training data can include, for each of multiple times, properties of the particular satellite communication system at the time and labels that indicate one or more causes of a condition of the particular satellite communication system at the time. The one or more labels can be assigned to the properties of the particular satellite communication system by a network operator.

In some implementations, the one or more machine learning models include multiple machine learning models. Each machine learning model can be trained using different training parameters than each other machine learning model. The different training parameters can include at least one of (i) different types of machine learning models, (ii) different subsets of the labeled training data, or (iii) different configurations of a same type of machine learning model.

In some implementations, the one or more machine learning models include multiple machine learning models. The one or more machine learning model outputs can include, for each potential cause of the condition of the satellite communication system, a probability that the potential cause is an actual cause of the condition of the satellite communication system. Determining the particular cause indicated as being a most likely cause of the condition of the satellite communication system can include determining the particular cause based on one or more combined scores generated by determining, for each of the one or more potential causes, a combination of the probabilities output by the machine learning models for the potential cause.
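
As a minimal sketch of such a combined score, the following Python example averages each cause's probability across the model outputs and selects the cause with the highest average. The averaging rule, the function names, and the sample probabilities are assumptions chosen for illustration; the description above requires only some combination of the probabilities.

```python
from collections import defaultdict
from statistics import mean


def combine_probabilities(model_outputs):
    """Average each cause's probability across models.

    The mean is an assumed combination rule. A cause that a model does
    not report is averaged only over the models that did report it.
    """
    per_cause = defaultdict(list)
    for output in model_outputs:
        for cause, probability in output.items():
            per_cause[cause].append(probability)
    return {cause: mean(values) for cause, values in per_cause.items()}


def most_likely_cause(model_outputs):
    """Return the cause with the highest combined score."""
    combined = combine_probabilities(model_outputs)
    return max(combined, key=combined.get)


# Hypothetical outputs of three models for the same feature vector.
model_outputs = [
    {"slow access to NAS": 0.70, "ISP routing issue": 0.20, "normal": 0.10},
    {"slow access to NAS": 0.55, "ISP routing issue": 0.35, "normal": 0.10},
    {"slow access to NAS": 0.60, "ISP routing issue": 0.15, "normal": 0.25},
]
print(most_likely_cause(model_outputs))
```

Other combinations, such as a weighted average or the maximum probability reported by any model, would fit the same structure.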

In some implementations, the one or more machine learning models are trained to output an indication that the condition of the satellite communication system is normal based on the one or more feature vectors when the one or more machine learning models detect that the condition of the satellite communication system is normal based on the properties of the satellite communication system.

In some implementations, the one or more feature vectors include multiple feature vectors for a particular time period. Each feature vector can include feature values that represent properties of the satellite communication system at a different time within the time period than each other feature vector. The one or more machine learning models can be trained to output an indication of potential causes of the condition of the satellite communication system based on the properties of the satellite communication system during the time period represented by the multiple feature vectors.

Some implementations include updating the one or more machine learning models. The updating can include receiving additional training data that includes a set of additional feature vectors and labels for the additional feature vectors, including a label, for each additional feature vector, that specifies a cause of a condition of the satellite communication system at the time corresponding to the additional feature vector. Each additional feature vector can include feature values that represent actual properties of the satellite communication system detected at a time corresponding to the additional feature vector. The updating can also include training the one or more machine learning models using the additional training data.

Some implementations include using the one or more machine learning models to determine most likely causes of conditions of a second satellite communication system different from the satellite communication system based on properties of the second satellite communication system.

Some implementations include selecting, by the one or more computers, an action to alter network operation of the satellite communication system based at least on the machine learning model outputs and the particular cause. The selecting can include accessing a set of rules that specify, for each potential cause, one or more corresponding actions for altering the network operation of the satellite communication system and selecting, as the action to alter the network operation of the satellite communication system, at least one of the one or more actions that correspond to the particular cause.
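
A rule set of this kind can be sketched as a simple lookup table from causes to candidate actions. In the Python example below, the table name, the function name, and the actions listed for the ISP and uplink modulator causes are hypothetical; only the NAS cause and its action follow the example given in this description.

```python
# Hypothetical rule set mapping each potential cause to candidate actions.
ACTION_RULES = {
    "slow access to NAS": ["switch to a different NAS device"],
    # The two entries below are assumed for illustration only.
    "ISP routing issue": ["reroute traffic through an alternate ISP"],
    "uplink modulator issue": ["switch to a backup uplink modulator"],
}


def select_actions(cause, rules=ACTION_RULES):
    """Return the actions that the rules associate with the given cause.

    An empty list means the rules specify no action for this cause.
    """
    return list(rules.get(cause, []))


print(select_actions("slow access to NAS"))
```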

Some implementations include selecting, by the one or more computers, an action to alter network operation of the satellite communication system based at least on the particular cause. The selecting can include providing data indicating the particular cause and the one or more feature vectors as input to one or more second machine learning models trained to receive a cause of a condition of the satellite communication system and at least one feature vector that includes feature values representing properties of the satellite communication system and output an indication of one or more actions to alter the network operation of the satellite communication system based on the cause and the at least one feature vector. The selecting can also include receiving, from each of the one or more second machine learning models, one or more second machine learning outputs that indicate one or more actions to alter the network operation of the satellite communication system based on the particular cause and the one or more feature vectors. The action to alter the network operation of the satellite communication system can be selected based at least on the one or more second machine learning outputs.

In some implementations, selecting the action to alter the network operation of the satellite communication system can include selecting the action based on data specifying results of each of the one or more actions indicated by the one or more second machine learning outputs when each of the one or more actions was previously performed in response to a previous instance of satellite communication system conditions associated with the particular cause.

In some implementations, the one or more second machine learning models are trained to output the indication of one or more actions to alter the network operation of the satellite communication system based on results of previous actions performed in response to previous conditions of the satellite communication system and associated causes of the previous conditions.

In some implementations, determining, by the one or more computers and based on the one or more machine learning model outputs received from each of the one or more machine learning models, a particular cause indicated as being a most likely cause of the condition of the satellite communication system can include identifying, for each of the one or more potential causes and based on the one or more machine learning outputs received from each of the one or more machine learning models for feature vectors that represent properties of the satellite communication system over a particular time period, a sequence of probabilities that the potential cause is an actual cause of the condition of the satellite communication system over the particular time period. The determining can also include selecting the particular cause based at least on the sequence of probabilities for the particular cause and the sequence of probabilities for each other potential cause.

In some implementations, selecting the particular cause based at least on the sequence of probabilities for the particular cause and the sequence of probabilities for each other potential cause can include selecting the particular cause in response to detecting an increase in the probabilities for the particular cause during the particular time period.
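
The following Python sketch shows one way a cause could be selected from such probability sequences. It assumes, for illustration only, that an increase is detected by comparing the last probability in a sequence with the first, and that the selected cause must also exceed a threshold and the latest probability of every other cause. The function name and the 0.5 threshold are hypothetical.

```python
def select_rising_cause(sequences, threshold=0.5):
    """Pick a cause whose probability rose over the period.

    sequences maps each cause to its probabilities in time order.
    A cause is selected when its latest probability is higher than its
    first, exceeds the threshold, and exceeds the latest probability of
    every other cause. Returns None when no cause qualifies.
    """
    for cause, probs in sequences.items():
        if len(probs) < 2:
            continue  # An increase cannot be detected from one sample.
        latest = probs[-1]
        rising = latest > probs[0]
        exceeds_others = all(
            latest > other[-1]
            for name, other in sequences.items()
            if name != cause and other
        )
        if rising and latest > threshold and exceeds_others:
            return cause
    return None


# Hypothetical probabilities for three consecutive points in time.
sequences = {
    "slow access to NAS": [0.2, 0.4, 0.7],
    "ISP routing issue": [0.5, 0.4, 0.2],
    "normal": [0.3, 0.2, 0.1],
}
print(select_rising_cause(sequences))
```

A production system might instead fit a trend over the whole sequence, which would be less sensitive to a single noisy sample.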

Some implementations include selecting, by the one or more computers, an action to alter network operation of the satellite communication system based at least on the particular cause and determining to perform the selected action automatically based on at least one of (i) a duration of time between providing the indication of the particular cause of the condition of the satellite communication system and receiving an operator command to perform the selected action exceeding a threshold duration, (ii) a category of the particular cause, (iii) a severity of the particular cause, or (iv) a severity of the condition of the satellite communication system. Some implementations can also include causing the selected action to be performed.
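
The four criteria can be sketched as a single predicate. In the Python example below, the field names, the severity scale, the set of categories, and the timeout value are all hypothetical; the description above names the criteria but not their representation or values.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Hypothetical record of a diagnosed condition awaiting action."""
    cause_category: str
    cause_severity: int            # assumed scale: 1 (minor) to 5 (critical)
    condition_severity: int        # same assumed scale
    seconds_since_indication: float
    operator_responded: bool


# Assumed policy values, chosen only for illustration.
AUTO_CATEGORIES = {"storage"}
SEVERITY_LIMIT = 4
RESPONSE_TIMEOUT_S = 300.0


def should_act_automatically(d):
    """Return True when at least one of the four criteria is met."""
    timed_out = (
        not d.operator_responded
        and d.seconds_since_indication > RESPONSE_TIMEOUT_S
    )
    return (
        timed_out
        or d.cause_category in AUTO_CATEGORIES
        or d.cause_severity >= SEVERITY_LIMIT
        or d.condition_severity >= SEVERITY_LIMIT
    )


waiting = Diagnosis("routing", 2, 2, 400.0, False)
print(should_act_automatically(waiting))
```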

Some implementations include generating each of the one or more feature vectors. The generating for each particular feature vector can include identifying, for a component of the satellite communication system, properties of multiple sub-components of the component at the time corresponding to the particular feature vector; determining a property that represents the multiple sub-components based on the properties of the multiple sub-components; and including, in the particular feature vector and as a property of the component, the determined property.

Some implementations include generating each of the one or more feature vectors. The generating for each particular feature vector can include identifying, for a particular satellite beam, multiple components of a same type; identifying multiple properties of each of the multiple components; determining, for each property of the multiple properties, an aggregated value that represents an aggregation of the property across each of the multiple components; and including, in the particular feature vector and as a property of the satellite beam, the determined aggregated value.
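
As a minimal sketch of this aggregation, the following Python example averages each property across the components of one type in one beam and returns the averages as feature values. The mean is an assumed aggregation, and the function name, property names, and sample values are hypothetical.

```python
from statistics import mean


def aggregate_beam_features(components, property_names):
    """Return one aggregated value per property for a single beam.

    components holds the properties of each component of the same type.
    The mean is an assumed aggregation; a sum, maximum, or other
    statistic could be substituted.
    """
    return [mean(c[name] for c in components) for name in property_names]


# Hypothetical properties of two IP traffic handling instances in one beam.
ip_handlers = [
    {"wan_overflow_count": 13000, "queue_depth": 40},
    {"wan_overflow_count": 11000, "queue_depth": 20},
]
feature_values = aggregate_beam_features(
    ip_handlers, ["wan_overflow_count", "queue_depth"]
)
print(feature_values)
```

Fixing the order of `property_names` keeps each position in the resulting feature vector tied to the same property from one point in time to the next.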

Other embodiments include corresponding systems, apparatus, and software programs, configured to perform the actions of the methods, encoded on computer storage devices. For example, some embodiments include a satellite terminal and/or a satellite gateway configured to perform the actions of the methods. A device or system of devices can be so configured by virtue of software, firmware, hardware, or a combination of them installed so that, in operation, they cause the system to perform the actions. One or more software programs can be so configured by virtue of having instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.

The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram that illustrates an example of a system for using machine learning models to detect causes of conditions of a satellite communication system.

FIG. 2 is a diagram that illustrates an example of a system for using machine learning models to detect causes of conditions of a satellite communication system.

FIG. 3 is a flow diagram that illustrates an example process for generating a feature vector for network health.

FIG. 4 is a flow diagram that illustrates an example process for training machine learning models to detect causes of conditions of a satellite communication system.

FIG. 5 is a flow diagram that illustrates an example process for using machine learning models to detect causes of conditions of a satellite communication system and performing a selected action to alter operation of the satellite communication system.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

FIG. 1 is a diagram that illustrates an example of a system 100 for using machine learning models to detect causes of conditions of a satellite communication system. The system 100 includes satellite gateways 110 a and 110 b that communicate with a satellite 120, which in turn communicates with satellite terminals 130 a and 130 b. The system 100 also includes a computer system 140 that obtains information about the satellite communication system, for example, by communication with the satellite gateways 110 a and 110 b over a communication network 150. The elements shown can be part of a larger satellite communication network that includes several satellites, several satellite gateways, satellite terminals, and other elements not illustrated.

The satellite gateways 110 a and 110 b, the satellite 120, and the satellite terminals 130 a and 130 b can include subsystems that include multiple software and hardware components. In addition, the subsystems and their components can also have multiple instances of software processes that implement the subsystems or components. For example, a satellite gateway can include one or more IP traffic handling components, satellite forward channel handling component(s), and satellite return channel handling component(s), just to name a few of the components. The satellite gateway can also include multiple instances of the IP traffic component to handle traffic of each channel of each satellite beam.

The example of FIG. 1 illustrates how the computer system 140 can train and use machine learning models to evaluate the satellite communication system and detect one or more causes of a condition of the satellite communication system. The computer system 140 can also select an action to alter the operation of the satellite communication system based at least on the outputs of the machine learning models and cause the action to be performed. For example, the computer system 140 can train and use the machine learning models to detect a primary cause of a problem with the satellite communication system and select an action that will correct or prevent the problem from escalating. Various steps of the process are illustrated as stages labelled (A) through (J) which illustrate a flow of data.

In stage (A), the computer system 140 obtains network component information 141 that includes information about (e.g., properties of) the subsystems and components of the satellite communication system. The network component information 141 can include information about the various subsystems and components of the satellites, gateways, terminals, and other elements that make up the satellite network. The computer system 140 can obtain the network component information 141 from at least some of the elements. For example, the computer system 140 can obtain the network component information 141 from one or more of the satellite gateways 110 a and 110 b, which can obtain information from the satellite 120 and the satellite terminals 130 a and 130 b. In another example, the computer system 140 can obtain the network component information from a hub that obtains the information from one or more satellite gateways 110 a and 110 b.

The network component information 141 can include various information for each of the subsystems, components, and their respective software instances. The information for a component can include status data (e.g., active, inactive, error state, etc.), metrics and statistics (e.g., current data transmission speeds, peak data transmission speeds, the number of items in a queue, number of dropped packets, and so on), error and alarm data indicating whether any or particular errors or alarms are present and rates of errors and alarms, and/or other appropriate information about the status or operation of the component. The type of information and the amount of information can vary based on the component or type of component. For example, the information for an IP traffic handling subsystem can be different from the information for a data storage component, e.g., a network-attached storage (NAS) device.

The computer system 140 can obtain the network component information 141 periodically based on a specified time period, e.g., one minute, five minutes, one hour, or another appropriate time period. For each point in time, the network component information 141 represents the overall state or health of the satellite communication system at that point in time.

In stage (B), the network component information 141 for each point in time (or for each time period) is assigned (or otherwise associated with) one or more labels. For example, a user 143 (e.g., a network operator or network expert) can label the network component information 141 for each point in time (or time period) with a label that indicates a cause of a condition of the satellite communication system at that point in time (or during that time period). If the same cause or condition is being experienced over a time period, the network component information 141 for each point in time can be assigned the same label.

In some implementations, the computer system 140 can provide an interface that includes controls that enable the user 143 to select, from a set of pre-specified labels, a label that represents the cause of the condition of the satellite communication system at that time. For example, the interface can present, as selectable controls, a set of labels that includes a normal label that indicates that the condition of the satellite communication system is normal and that there is no cause of any problems or other conditions in the satellite communication system. The set of labels can also include a label for each of a set of causes of conditions of (e.g., problems, potential problems, or issues with) the satellite communication network. For example, these labels can include labels for causes that have been detected by users in the past, labels for causes that are of interest to the users, labels for causes of conditions of other satellite communication systems that could occur in the satellite communication system, and/or other appropriate causes of conditions of satellite communication systems.

Some example labels include “slow access to NAS” when access to a NAS device is slow (e.g., less than a threshold speed), “ISP routing issue” when there is a problem or other issue with routing Internet data to or from an Internet Service Provider (ISP), and “uplink modulator issue” when there is a problem or other issue with the uplink modulator. The labels can describe conditions of components that can cause conditions of the overall satellite communication system. For example, when the NAS device is slow, this can prevent other components from accessing necessary data, resulting in queues overflowing, network traffic slowing down, and overall performance of at least one channel or beam being degraded. Thus, the labels can represent a primary cause (e.g., a root cause) of a condition (e.g., performance degradation, slow traffic, and so on) of the satellite communication system.

The computer system 140 can provide an interface that enables the user 143 to assign a label to the network component information 141 for a particular point in time based on an investigation into the cause of the condition of the satellite communication network. For example, after determining that a portion of the satellite communication system is experiencing a problem, the user 143 can review data logs, check the status of components, and/or perform other procedures to identify the cause of the problem. Once the cause is found, the user 143 can select a label that represents the cause, e.g., using an interface of the computer system 140. The computer system 140 can assign the selected label to the network component information 141 for the point in time or time period in which the condition occurred and the labelled cause was the cause of the condition. For example, the user 143 can select a label and select the time period during which the condition occurred. The computer system 140 can generate labelled network component information 144 by identifying network component information 141 for each point in time during the time period and assigning the selected label to the network component information 141 for each point in time during the time period. The computer system 140 can store the assigned label with respect to a particular data set that includes the network component information 141 that represents the properties of the satellite communication system at a particular point in time.
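
The labelling of a time period can be sketched as follows. This Python example assumes, for illustration, that each collected data set is a dictionary with a `time` key and that the period is inclusive at both ends; the function name, the key names, and the sample values are hypothetical.

```python
from datetime import datetime


def label_snapshots(snapshots, start, end, label):
    """Assign a label to every snapshot collected within [start, end].

    Snapshots outside the period are copied unchanged, so the input
    list is never modified.
    """
    return [
        {**snap, "label": label} if start <= snap["time"] <= end else dict(snap)
        for snap in snapshots
    ]


# Hypothetical snapshots collected one minute apart.
snapshots = [
    {"time": datetime(2018, 8, 31, 12, 0), "wan_overflow_count": 200},
    {"time": datetime(2018, 8, 31, 12, 1), "wan_overflow_count": 13000},
    {"time": datetime(2018, 8, 31, 12, 2), "wan_overflow_count": 12500},
]
labelled = label_snapshots(
    snapshots,
    start=datetime(2018, 8, 31, 12, 1),
    end=datetime(2018, 8, 31, 12, 2),
    label="slow access to NAS",
)
print([snap.get("label") for snap in labelled])
```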

In some implementations, the computer system 140 can also include, in the labelled network component information 144, context data that indicates the condition of the satellite communication system and any actions performed to alter the operation of the satellite communication system. For example, the computer system 140 can store records that associate network conditions and causes with related actions performed in response (e.g., configuration changes to correct a problem), as well as effects that are subsequently observed or are attributed to the actions. The user 143 can perform or cause the components of the satellite communication system to perform one or more actions to correct a problem with the satellite communication system. The user 143 can provide, to the computer system 140, data specifying the actions. In some implementations, the computer system 140 can monitor the actions and effects of the actions automatically. For example, if the user 143 initiates the actions from an interface provided by the computer system 140 or a network management system in communication with the computer system 140, the computer system 140 can associate those actions and resulting effects with an ongoing network condition without the user specifying that the action attempts to resolve the condition. The labelled network component information 144 can also include, for each action, data specifying whether the action was successful, e.g., as indicated by the user 143 or as detected by the computer system 140 or the network management system. In the illustrated example, the labelled network component information 144 indicates some network properties (e.g., the IP traffic handling subsystem has a wide area network (WAN) overflow count of 13,000 and the forward channel subsystem has a data transmission speed of 3.1 Mbps), a condition of slow network traffic, a cause of a slow NAS device, and an action of switching the NAS device to a different NAS device.

In stage (C), a machine learning training module 145 of the computer system 140 uses the labelled network component information 144 for multiple points in time as training examples for training one or more machine learning models. In particular, the machine learning training module 145 can train machine learning models using the labelled network component information 144 that has been collected and labelled over a given time period. In some implementations, the training examples used to train the machine learning models can include labelled network component information obtained from other satellite communication networks, e.g., in addition to the labelled network component information 144. The training examples used to train the machine learning models can include a subset of the labelled network component information 144, e.g., selected by a user.

Each machine learning model can be any of various types, such as a neural network, a maximum entropy classifier, a decision tree, an XGBoost tree, a random forest classifier, a support vector machine, a logistic regression model, a K-nearest neighbors model, and so on. The training process alters the parameters of the machine learning model so that the model learns internal function(s) or mapping(s) between an input set of properties of a satellite communication system and potential causes of conditions of the satellite communication system (and respective probabilities for the potential causes). The properties of the satellite communication system can include information about the various subsystems and components of the satellites, gateways, terminals, and other elements that make up the satellite network, e.g., similar to or the same as the network component information 141. The probability for a potential cause represents a likelihood or confidence that the potential cause is the actual cause of the condition.

In some implementations, the labels used to train the machine learning model(s) include only the causes of the conditions included in the labelled network component information 144. In some implementations, the labels used to train the machine learning model(s) also include the actions and whether the actions were successful at resolving the conditions.

Each machine learning model can be configured to receive properties ofthe satellite communication system (e.g., in the form of a featurevector) as input and output machine learning outputs that indicate oneor more potential causes of the condition of the satellite communicationsystem (e.g., one or more most likely causes). The machine learningoutputs can also include, for each potential cause, a probability orconfidence that the potential cause is the actual cause of thecondition. In some implementations, each machine learning model istrained to receive multiple sets of properties of the satellitecommunication system (e.g., multiple feature vectors) obtained over atime period and output one or more causes of the condition of thesatellite communication system during the time period and theirrespective probabilities. For example, the inputs can include multiplefeature vectors and each feature vector can include feature values thatrepresent the properties of the satellite communication system at aparticular point in time during the time period. Each feature vector canbe for a different point in time than each other feature vector providedas input. For example, the feature vectors can represent periodic statesof the satellite communication system, e.g., a feature vector for eachminute during the time period.

In some implementations, multiple machine learning models are trainedand used to detect causes of conditions of the satellite communicationsystem. In this example, each machine learning model can be differentand the machine learning outputs of the machine learning models can becombined to determine a most likely cause of the condition of thesatellite communication system, as described below. The machine learningmodels can be trained differently and/or be of different types (e.g.,one or more neural networks and one or more random forest classifiers).The machine learning models can be trained differently by usingdifferent parameters (e.g., different tuning parameters, optimizers, orlayers) and/or using different subsets of the labelled network componentinformation. For example, the machine learning models 145 can includeneural networks that have different numbers of layers and/or that havebeen trained using different subsets of the labelled data.

In some implementations, a respective machine learning model can betrained for each cause of conditions of the satellite communicationsystem. In this example, the machine learning model for a particularcause can be trained using the feature vectors and, for each featurevector, a label indicating whether the particular cause is the actualcause of the condition. The machine learning model for the particularcause can be trained to output a probability that the particular causeis the actual cause of a condition of the satellite communication systembased on properties of the satellite communication system.
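As a minimal sketch of how per-cause training labels could be derived, the snippet below turns a single list of cause labels into one binary label list per cause. The function name and the cause strings are hypothetical and chosen only for illustration; the description does not specify how the labels are encoded.

```python
def one_vs_rest_labels(cause_labels, causes):
    """Build one binary label list per cause (hypothetical helper).

    cause_labels[k] is the labelled cause for the k-th feature vector.
    For each cause, the result holds 1 where that cause is the actual
    cause of the condition and 0 otherwise.
    """
    return {
        cause: [1 if label == cause else 0 for label in cause_labels]
        for cause in causes
    }


# Illustrative cause names, not taken from the description.
labels = one_vs_rest_labels(
    ["slow_nas", "normal", "slow_nas"], ["slow_nas", "normal"]
)
```

Each binary list could then serve as the training target for the model dedicated to that cause.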

In some implementations, the machine learning training module 145reduces the number of dimensions of each feature vector prior to usingthe feature vector to train the machine learning model(s). For example,the machine learning training module 145 can reduce the number ofdimensions of each feature vector using a feedforward neural network,principal component analysis (PCA), another appropriate dimensionalityreduction technique, and/or a combination of dimensionality reductiontechniques. An example process for training a machine learning model todetect causes of conditions of a satellite communication system isillustrated in FIG. 4 and described below. After training the machinelearning models, the machine learning training module 145 can providemachine learning model data 146 that includes the machine learningmodels to a machine learning module 155 that uses the models todetermine a cause of a condition of the satellite communication system.

In stage (D), the computer system 140 obtains satellite systeminformation 151 that indicates properties of the satellite communicationnetwork. As described above, the properties of the satellitecommunication system can include information about the varioussubsystems and components of the satellites, gateways, terminals, andother elements that make up the satellite network, e.g., similar to orthe same as the network component information 141. The computer system140 can obtain the satellite system information 151 from one or moreelements of the satellite communication system, e.g., from one or morehubs (e.g., one or more gateways) of the satellite communication system.The computer system 140 can obtain the satellite system information 151periodically based on a specified time period. Each set of propertiesfor the satellite communication system can correspond to a particularpoint in time.

In this example, the satellite system information 151 includes data about an IP subsystem, a forward channel subsystem, a return channel subsystem, an infrastructure subsystem, and other components for which information is not presented in FIG. 1. The satellite system information 151 can include various data for each component. For example, the satellite system information 151 can include, for the IP traffic handling subsystem, a WAN queue overflow count that indicates a quantity of items added to the queue that exceeds the size of the queue and an acceleration backbone down count that indicates a number of times the acceleration backbone has gone down over a time period. Similarly, the satellite system information 151 includes a data transmission speed for the forward channel subsystem, a data transmission speed for the return channel subsystem, and data indicating that a router traffic alarm of the infrastructure subsystem is present and that a NAS health alarm of the infrastructure subsystem is not present. Of course, the satellite system information 151 can include other data about the IP traffic handling subsystem, the forward channel subsystem, the return channel subsystem, and the infrastructure subsystem.

In stage (E), the data processing module 153 of the computer system 140 receives the satellite system information 151 and prepares the information 151 for input to the machine learning model(s). As the properties of the satellite communication system (e.g., the information about the components of the satellite communication system) include different types of information (e.g., status, alarms, numerical data, and so on), the data processing module 153 can convert the information to an appropriate (e.g., common) format. For example, the data processing module 153 can convert any non-numerical data to numerical data that represents the non-numerical data. The data processing module 153 can convert each type of non-numerical data to numerical data using a conversion function. For example, information specifying whether an alarm is present can be converted to a zero if the alarm is not present or to a one if the alarm is present.
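The conversion step can be sketched as follows. The alarm mapping (zero for absent, one for present) follows the text; the status names and their numeric codes in `STATUS_CODES` are assumptions made for illustration only.

```python
# Hypothetical status-to-number mapping; the description does not
# list the actual statuses or the values assigned to them.
STATUS_CODES = {"down": 0.0, "degraded": 0.5, "up": 1.0}


def alarm_to_number(alarm_present: bool) -> float:
    """Map an alarm flag to 1.0 if present, 0.0 if not present."""
    return 1.0 if alarm_present else 0.0


def status_to_number(status: str) -> float:
    """Map a predefined status string to its numeric code."""
    if status not in STATUS_CODES:
        raise ValueError(f"unknown status: {status!r}")
    return STATUS_CODES[status]
```

Raising on an unknown status, rather than silently defaulting, is one possible design choice; it keeps unexpected values out of the feature vector.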

In some implementations, the data processing module 153 aggregates aportion of the satellite system information. The data processing module153 can aggregate information based on component type, location in thesatellite communication system, and/or the type of information. Forexample, if there are multiple instances of a same type of component(e.g., multiple instances of an IP traffic handling subsystem) for asame gateway or same beam, the data processing module 153 can aggregate(e.g., by averaging, convex summation, or another appropriateaggregation technique) the various data about the IP traffic handlingsubsystem across each instance. For each piece of information for the IPtraffic handling subsystem, the data processing module 153 can aggregatethat piece of information for the multiple instances. For example, if apiece of information is a data transmission speed, the data processingmodule 153 can determine the average data transmission speed for theinstances of the IP traffic handling subsystem of the gateway or beam.This aggregated value can be a feature value for a feature of the beamor gateway.
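A minimal sketch of the two aggregation techniques named above, averaging and convex summation, is given below. The function names and the sample speeds are illustrative assumptions. A convex sum is a weighted sum whose weights are non-negative and add up to one.

```python
def average(values):
    """Arithmetic mean of one statistic across component instances."""
    if not values:
        raise ValueError("no values to aggregate")
    return sum(values) / len(values)


def convex_sum(values, weights):
    """Weighted sum with non-negative weights that sum to one."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * v for w, v in zip(weights, values))


# Illustrative speeds (Mbps) for three instances of one subsystem.
speeds = [3.0, 3.2, 3.1]
mean_speed = average(speeds)
weighted_speed = convex_sum([2.0, 4.0], [0.25, 0.75])
```

Averaging is the special case of a convex sum in which every instance receives the same weight.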

The data processing module 153 can also normalize the satellite systeminformation 151. For example, the data processing module 153 cannormalize each piece of information such that the value of each piece ofinformation has a value within a particular value range, e.g., betweenzero and one inclusive. Example techniques for converting, aggregating,and normalizing information are described below with reference to FIG.3.
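One common way to map a value into the range from zero to one inclusive is min-max scaling, sketched below. The description does not state which normalization is used, so this technique, the function name, and the assumed bounds are illustrative assumptions.

```python
def min_max_normalize(value, lower, upper):
    """Scale value into [0, 1] given assumed lower and upper bounds.

    Values outside the bounds are clipped so the result always stays
    within the range zero to one inclusive.
    """
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    clipped = min(max(value, lower), upper)
    return (clipped - lower) / (upper - lower)
```

For example, with assumed bounds of 0 and 20,000, an overflow count of 13,000 would map to about 0.65.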

In stage (F), a machine learning module 155 obtains the processed satellite system information 154 and uses the trained machine learning model(s) and the processed satellite system information 154 to determine one or more potential causes of a condition of the satellite communication system. In some implementations, the machine learning module 155 generates a feature vector based on the processed satellite system information 154. The feature vector can include feature values that represent the properties of the satellite communication system. For example, the feature vector can include a feature value for each piece of information for each component (and/or for each aggregated value) included in the processed satellite system information 154. This feature vector is referred to herein as a feature vector for network health (FVNH) as the feature values included in the feature vector represent the overall status or health of the satellite communication system.

In some implementations, the machine learning module 155 pre-processesthe FVNH prior to providing the FVNH as input to the machine learningmodel(s). The pre-processing can include reducing the dimensionality ofthe FVNH, e.g., using a feedforward neural network, principal componentanalysis (PCA), and/or another appropriate dimensionality reductiontechnique. By reducing the dimensionality of the FVNHs, the speed atwhich the machine learning models determine potential causes of acondition of the satellite communication system can be increased and theaccuracy of the machine learning models can be increased by preventingoverfitting.
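The PCA option can be sketched with NumPy as below. This is one standard way to compute PCA (centering followed by a singular value decomposition); it assumes NumPy is available and is not presented as the specific implementation used by the machine learning module 155. The function name is hypothetical.

```python
import numpy as np


def pca_reduce(fvnhs, k):
    """Project feature vectors onto their top-k principal components.

    fvnhs: array-like of shape (num_vectors, num_features).
    Returns an array of shape (num_vectors, k).
    """
    x = np.asarray(fvnhs, dtype=float)
    centered = x - x.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```

The principal directions here are fitted on the batch passed in; in practice they would more likely be fitted on training data and then reused for new FVNHs.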

The machine learning module 155 can provide the pre-processed FVNH (and optionally one or more other pre-processed FVNHs for the same time period) as input to each machine learning model. Each machine learning model can output machine learning outputs 156 based on the FVNH(s). The machine learning outputs 156 can indicate one or more potential causes of a condition of the satellite communication system and, for each potential cause, a probability that the potential cause is the actual cause of the condition. For example, each machine learning model can output a vector of probabilities that includes a probability for each potential cause in a set of potential causes. The set of potential causes can include each of the pre-specified labels that were used to label the training data used to train the machine learning model(s). The set of potential causes can also include a “normal” cause that indicates that the satellite communication system is operating normally and does not have a cause of a problem. In this example, the probability of cause A is 0.0%, the probability of cause B is 0.1%, the probability of cause C (Slow NAS) is 0.4% and the probability of cause Z is 0.0%.

In some implementations, each machine learning model outputs one or moremost likely causes based on the input FVNH(s). Each machine learningmodel can also output, for each of the one or more most likely causes, arespective probability that the most likely cause is the actual cause.In implementations in which the machine learning model(s) are trainedusing labels that indicate actions to alter the operation of thesatellite communication system, each machine learning model can alsooutput one or more actions based on the properties of the satellitecommunication system and/or the most likely cause(s) of the condition ofthe satellite communication system. Each machine learning model can alsooutput, for each action, a probability that the action will alter theoperation of the satellite communication system (e.g., a probabilitythat the action will correct a problem in the satellite communicationsystem).

In stage (G), an analysis and recommendation module 157 receives themachine learning outputs 156, determines a most likely cause of thenetwork condition, and can select an action to alter operation of thesatellite communication network based on the most likely cause. Todetermine the most likely cause, the analysis and recommendation module157 can evaluate the probability of each potential cause. In thisexample, the analysis and recommendation module 157 can select, as themost likely cause, the cause having the highest probability. If multiplemachine learning models are used, the analysis and recommendation module157 can determine a combined score (e.g., a combined probability) foreach potential cause based on the probability of that potential causeoutput by each machine learning model. For example, the combined scorefor a potential cause can be the average of the probabilities for thepotential cause output by the machine learning models.
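The averaging and selection logic described above can be sketched as follows. The function names and cause names are hypothetical, and each model output is assumed to be a mapping from cause to probability over the same set of causes.

```python
def combine_model_outputs(model_outputs):
    """Average each cause's probability across all model outputs."""
    if not model_outputs:
        raise ValueError("at least one model output is required")
    causes = model_outputs[0].keys()
    return {
        cause: sum(out[cause] for out in model_outputs) / len(model_outputs)
        for cause in causes
    }


def most_likely_cause(scores):
    """Return the cause with the highest (combined) score."""
    return max(scores, key=scores.get)


# Illustrative outputs from two models.
outputs = [
    {"slow_nas": 0.6, "normal": 0.4},
    {"slow_nas": 0.8, "normal": 0.2},
]
combined = combine_model_outputs(outputs)
top_cause = most_likely_cause(combined)
```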

In some implementations, the analysis and recommendation module 157 candetermine the most likely cause based on the probabilities for eachpotential cause over a time period. For example, the probabilities of apotential cause may change over time based on changes in the propertiesof the satellite communication network. In a particular example, if thestatistics or metrics for components affected by a slow NAS device getworse over time, the probability of the cause of a degraded satellitesystem caused by a slow NAS can increase over time. If the probabilityof a particular cause remains the highest probability amongst thevarious potential causes for at least a threshold duration of time, theanalysis and recommendation module 157 can determine that the particularcause is the most likely cause. In another example, if the probabilityof the particular cause increases at least a threshold amount over aperiod of time or increases to become the highest probability amongstthe potential causes, the analysis and recommendation module 157 candetermine that the particular cause is the most likely cause. In theillustrated example, the probability of the slow NAS has the highestprobability and may be selected as the most likely cause.
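The threshold-duration check can be sketched as below. As a simplifying assumption, the duration is expressed as a number of consecutive, periodically sampled probability vectors rather than as elapsed time; the function name is hypothetical.

```python
def persistent_top_cause(history, min_consecutive):
    """Return the cause that ranked highest in each of the last
    min_consecutive samples, or None if no single cause did.

    history: time-ordered list of dicts mapping cause -> probability.
    """
    if min_consecutive < 1 or len(history) < min_consecutive:
        return None
    recent = history[-min_consecutive:]
    top_causes = {max(probs, key=probs.get) for probs in recent}
    return top_causes.pop() if len(top_causes) == 1 else None
```

With one FVNH per minute, for instance, `min_consecutive=5` would correspond to a threshold duration of roughly five minutes.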

The analysis and recommendation module 157 can use a set of rules and/orone or more machine learning models to select an action to alter theoperation of the satellite communication system based at least on thedetermined most likely cause. The set of rules can specify an actionbased on one or more of the causes having the highest probabilities. Forexample, a rule may specify that, if the most likely cause is a slow NASdevice, the action is to failover the NAS device to a backup NAS device.Another example rule may specify that, if the most likely cause is aslow NAS device and the next most likely cause is a communication modulethat communicates with the NAS device, the action is to reconfigure thecommunication module. The set of rules can be generated and maintainedby a network operator, network expert, or another user.
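The two example rules above can be sketched as a small rule function. The cause and action identifiers are hypothetical strings; the more specific rule is checked first so that it takes precedence over the general slow-NAS rule, which is an assumed ordering.

```python
def select_action(ranked_causes):
    """Pick an action from causes ordered from most to least likely.

    Returns None when no rule matches.
    """
    top = ranked_causes[0] if ranked_causes else None
    second = ranked_causes[1] if len(ranked_causes) > 1 else None
    # More specific rule first: slow NAS plus its communication module.
    if top == "slow_nas" and second == "nas_comm_module":
        return "reconfigure_comm_module"
    if top == "slow_nas":
        return "failover_nas_to_backup"
    return None
```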

The machine learning models for selecting the action can be trained to select an action based on the most likely cause and/or the FVNH(s) used to determine the potential causes and the probabilities of the potential causes. The machine learning models can be trained using labelled feature vectors that are labelled with actions that were performed to alter the operation of the satellite communication system (e.g., that corrected a problem with the satellite communication system). In some implementations, the feature vectors used to train the machine learning models include a vector of probabilities for the potential causes. In some implementations, the feature vectors include the same or similar data as the feature vectors used to train the machine learning models used by the machine learning module 155 to determine the probabilities of the potential causes.

The feature vectors can also be labelled with a level of effectivenessof the action. For example, if multiple actions were attempted tocorrect a problem, the label for the feature vectors that represent theproperties of the satellite communication system while the problem wasoccurring can include each attempted action and a level of effectivenessof the action. In this way, the machine learning models can be trainedto output an action based on how effective that action is predicted tobe at altering the operation of the satellite communication system.

The analysis and recommendation module 157 can provide the appropriatefeature vector(s) as input to the machine learning model(s) and receivemachine learning outputs that indicate one or more actions. If machinelearning model(s) were trained using probabilities of potential causes,the analysis and recommendation module 157 can provide, as the input,one or more vectors of probabilities output by the machine learningmodule 155. If the machine learning model(s) were trained using thefeature vectors that represent the properties of the satellitecommunication system, the analysis and recommendation module 157 canprovide, as the input, FVNH(s) used by the machine learning module 155to determine the potential causes and their respective probabilities.

The analysis and recommendation module 157 can then provide data 158identifying the action(s) and/or the most likely cause(s) to an actionmodule 161. The analysis and recommendation module 157 can also providedata 159 identifying the action(s) and/or the most likely cause(s) to auser interface module 163.

In stage (H), the user interface module 163 can generate and provide auser interface, e.g., to a device of the user 143 or another user, thatindicates the action(s) and/or the most likely cause(s). In someimplementations, the device indicates the action(s) and the most likelycause(s) to the user 143 by way of e-mail, text message, and/or a spokenlanguage interface.

The user interface module 163 can generate and update a dashboardinterface that presents a current condition of the satellitecommunication system, one or more of the most likely causes of thecondition, and/or actions that can be performed to resolve thecondition, if appropriate. The user interface module 163 can alsogenerate alarms when appropriate. For example, if at least a thresholdnumber of sequential FVNHs are mapped to the same cause, the userinterface module 163 can generate an alarm to alert a user (e.g., anetwork operator) to the cause.

In stage (I), the action module 161 determines whether to perform anaction to alter the operation of the satellite communication network. Insome implementations, the action module 161 prompts the user 143 toselect from one or more recommended actions, e.g., the one or moreactions selected by the analysis and recommendation module 157. If theuser 143 selects an action, the action module 161 can cause a componentof the satellite communication system to perform the action.

In some implementations, the action module 161 performs one or moreactions automatically, e.g., without input from the user 143. Forexample, the action module 161 can cause a component of the satellitecommunication system to perform an action (e.g., a top rated actionselected by the analysis and recommendation module 157). The actionmodule 161 can then monitor the status and other information about thesubsystems and components of the satellite communication system todetermine whether the action was effective. If not, the action module161 can cause another action to be performed, e.g., a next highestranked action.
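The try-then-fall-back behavior can be sketched as a loop over ranked actions. The `perform` and `is_effective` callables are hypothetical stand-ins for issuing the action to the satellite communication system and for monitoring whether the condition was resolved.

```python
def apply_actions_until_effective(ranked_actions, perform, is_effective):
    """Try actions from highest to lowest rank until one is effective.

    perform(action) triggers the action; is_effective(action) reports
    whether monitoring showed that it worked. Returns the effective
    action, or None if every action was tried without success.
    """
    for action in ranked_actions:
        perform(action)
        if is_effective(action):
            return action
    return None
```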

The action module 161 can determine to perform an action automaticallybased on the category or severity of the cause or condition. Forexample, the action module 161 can be configured (e.g., include a set ofrules) to perform the action automatically if the cause is categorizedin one of a set of pre-specified categories. In another example, theaction module 161 can be configured to perform the action automaticallyafter a duration of time passes (since the cause was first detected) andthe user 143 has not selected an action or performed an action. Thisproactive action can prevent a problem with the satellite communicationsystem from escalating.
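A minimal sketch of this decision, assuming the two conditions are combined with a logical OR, is shown below. The parameter names, category names, and the use of seconds as the time unit are illustrative assumptions.

```python
def should_act_automatically(
    category, seconds_since_detection, user_acted,
    auto_categories, timeout_seconds,
):
    """Decide whether to perform the action without user input.

    Acts automatically if the cause falls in a pre-specified category,
    or if the timeout has elapsed and the user has taken no action.
    """
    if category in auto_categories:
        return True
    return (not user_acted) and seconds_since_detection >= timeout_seconds
```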

In stage (J), the action module 161 provides data 165 to a component ofthe satellite system to initiate the action. In this example, theselected action is to switch the NAS device to another NAS device (e.g.,as part of a failover) as the cause of the condition is a slow NASdevice. In this example, the action module 161 can provide data 165 to anetwork management system that controls the NAS devices. The data cancause the network management system to switch the NAS device to anotherNAS device, e.g., a backup NAS device.

FIG. 2 is a diagram that illustrates an example of a system 200 forusing machine learning models to detect causes of conditions of asatellite communication system and includes similar elements as thesystem 100 of FIG. 1. The system 200 can be implemented in one or morecomputing systems. The system 200 includes a data module 210 thatincludes a data collection module 211 and a data processing module 212.The data collection module 211 can obtain network component information(e.g., the network component information 141) from the subsystems andcomponents of the satellite communication system. As described above,the network component information can include various information foreach of the subsystems, components, and their respective softwareinstances. The data collection module 211 can obtain the informationperiodically based on a specified time period, e.g., one minute, fiveminutes, one hour, or another appropriate time period.

The data collection module 211 can provide the raw data received fromthe component(s) of the satellite communication system to the dataprocessing module 212 and to a data storage device 220 for storage in adatabase 221. The data storage device 220 can be implemented as a NASdevice.

The data processing module 212 can prepare the raw data for input tomachine learning models. This preparation can include converting theinformation to an appropriate format, aggregating the information,and/or normalizing the information. As described above, some information(e.g., status and alarm data) can be in the form of text. This text datacan be converted to numerical data using a conversion function.

The data processing module 212 can aggregate a portion of the information based on component type, location in the satellite communication system, and/or the type of information. For example, as described above, each piece of information for multiple instances of the same type of component and within the same part of the network (e.g., part of the same channel, beam, or gateway) can be aggregated using averaging, convex summation, or another appropriate aggregation technique. The data processing module 212 can also normalize the information and any aggregated information to a particular range, e.g., from zero to one inclusive. Example techniques for converting, aggregating, and normalizing data are illustrated in FIG. 3 and described below.

The data processing module 212 can provide the processed data to machine learning pre-processing modules 230 and to the data storage device 220. The machine learning pre-processing modules 230 include a feature vector generator module 232 and a feature vector pre-processor 234. The feature vector generator module 232 can generate a feature vector (e.g., an FVNH) using the processed data received from the data processing module 212. The FVNH can be an n-dimensional vector that includes each processed value received from the data processing module 212. As described above, the FVNH represents the overall status or health of the satellite communication system at a particular point in time.

The feature vector generator module 232 provides the FVNH to the featurevector pre-processor 234 and to the data storage device 220 for storagein the database 221. Each FVNH can be stored in the database 221 withthe time corresponding to the data included in the FVNH (e.g., the timeat which the data was measured or obtained by the data collection module211).

The feature vector pre-processor 234 can perform dimensionality reduction on the FVNH using one or more dimensionality reduction techniques. In some implementations, the feature vector pre-processor 234 reduces the dimensionality of the FVNH using a first dimensionality reduction technique, e.g., using a feedforward neural network. For example, an m-stage feedforward neural network can reduce the dimensionality of the FVNH from n dimensions to m dimensions (e.g., from more than 100 to between 5 and 10). The feature vector pre-processor 234 can also further reduce the dimensionality of the resulting reduced dimension FVNH using a second dimensionality reduction technique, e.g., PCA. For example, this can reduce the dimensionality from 5-10 dimensions to three dimensions to make it easier for a human to visualize the results.

The feature vector pre-processor 234 can provide the reduced dimensionFVNH(s) to an ensemble of machine learning models 240 and to the datastorage device 220 for storage in the database (e.g., with theircorresponding times). As described above, each machine learning model241-243 can be configured to receive, as input, one or more FVNHs andoutput one or more potential causes of a condition of the satellitecommunication system and, for each potential cause, a respectiveprobability that the potential cause is an actual cause of thecondition. Also, as described above, each machine learning model 241-243can be trained or configured differently from each other machinelearning model 241-243.

In some implementations, two versions of the FVNH are provided as input to each machine learning model. For example, the first version can be the reduced FVNH that was reduced using the first dimensionality reduction technique, e.g., to 5-10 dimensions. The second version can be the reduced FVNH that was reduced using the second dimensionality reduction technique, e.g., to three dimensions. In this example, the machine learning outputs for both versions of the FVNH can be combined for each machine learning model 241-243. For example, the probability for each potential cause can be averaged for each machine learning model 241-243 before the outputs of the machine learning models 241-243 are combined. In a particular example, the machine learning model 241 can output a probability for a slow NAS of 0.1% using the first version of the FVNH as the input. The machine learning model 241 can also output a probability for the slow NAS of 0.3% using the second version of the FVNH as the input. In this example, the machine learning output of the machine learning model 241 for the slow NAS would be 0.2% (i.e., the average of 0.1% and 0.3%).

The machine learning outputs of the machine learning models 241-243 areprovided to an analysis and recommendation module 250 and to the datastorage device 220 for storage in the database 221. The analysis andrecommendation module 250 can determine, from the machine learningoutputs, a most likely cause of the condition of the satellitecommunication system and select an action to alter the operation of thesatellite communication system. For example, as described above, theanalysis and recommendation module 250 can determine the most likelycause by combining the machine learning outputs from multiple machinelearning models 241-243 and select an action using a set of rules and/orone or more machine learning models.

The analysis and recommendation module 250 can provide data indicating the most likely cause and the selected action to an action module 260, a user interface module 270, and the data storage device 220 for storage in the database 221. As described above, the action module 260 can cause the selected action to be performed, e.g., automatically, based on the category of the cause, the severity of the condition, and/or based on a duration of time elapsing without user action. The action module 260 can initiate the action by providing data (e.g., an instruction) to a network management system 265 that performs the action or causes a component of the satellite communication system to perform the action. The user interface module 270 can provide, to a device of a user, data indicating the most likely cause and the selected action, e.g., at a graphical user interface, by way of e-mail or text message, or using a spoken language interface.

FIG. 3 is a flow diagram that illustrates an example process 300 forgenerating an FVNH. The process 300 can be performed by the computersystem 140 of FIG. 1 to generate an FVNH that can be input to one ormore machine learning models and/or used to train one or more machinelearning models. The process 300 can be performed on an ongoing orperiodic basis to generate FVNHs that represent the status or health ofthe satellite communication system over time.

In step 302, the computer system 140 collects and stores data fromcomponents of a satellite communication system. The data can includeinformation about (e.g., properties of) the subsystems and components ofthe satellite communication system. For example, as described above, theinformation can include status data, metrics, statistics, alarm data,error data, and/or other appropriate information about the components.The data can be collected periodically based on a specified time period.Each set of data can be stored with a time stamp that indicates a timeat which the data was obtained.

In step 304, the computer system 140 collects and stores context data for the satellite communication system. The context data for an FVNH can include a condition of the satellite communication system, a time period in which the condition occurred, a cause of the condition, one or more actions taken (or not taken) to alter the operation of the satellite communication system (e.g., to correct a problem or issue), categories of the causes, conditions, and/or actions, and/or other appropriate context data. The context data can be obtained from a user (e.g., a network operator or network expert). For example, the computer system 140 can prompt the user to provide the data, e.g., on a periodic basis. The context data for a particular point in time can be stored with (or with a reference to) the network component data collected for the particular point in time.

In step 306, the computer system 140 generates an FVNH using thecollected data. The computer system 140 can generate the FVNH usingconstituent steps 308-318.

In step 308, the computer system 140 converts the data to an appropriateformat. For example, the computer system 140 can convert anynon-numerical data to numerical data. In a particular example, thecomputer system 140 can convert status data that can be one of a set ofpredefined statuses to a number that represents the status using aconversion function that maps each status to a corresponding numericalvalue.

In step 310, the computer system 140 aggregates information based on component type. For example, each component of a same type can have the same types of status, statistical, and other data. A subsystem can also include multiple instances of the same component. In these situations, the same data for multiple instances of the same component can be aggregated (e.g., by averaging or convex sum).

For example, let $C_i$, $i \in [1 \ldots n]$, represent the $i$-th component or subsystem and $s_{ij}^{(t)}$, $j \in [1 \ldots m]$, represent the $j$-th type of status or statistics of an instantiation of the $i$-th component or subsystem at time $t$. The computer system 140 can aggregate the status or statistic $s_{ij}^{(t)}$ from individual software instantiations of the same type of component to derive $S_{ij}^{(t)}$ using a function $f_i$. For example, the computer system 140 can perform the aggregation using Relationship 1 below:

$$S_{ij}^{(t)} = f_i\left(s_{ij_1}^{(t)}, s_{ij_2}^{(t)}, s_{ij_3}^{(t)}, \ldots, s_{ij_n}^{(t)}\right) \qquad \text{(Relationship 1)}$$

In this example, $S_{ij}^{(t)}$ represents an aggregate of the status or statistic $s_{ij}^{(t)}$. The function $f_i$ is an aggregation function, e.g., an average or a convex summation. For example, $S_{ij}^{(t)}$ can be an average data transmission speed for forward channel subsystems for a particular beam, gateway, or the entire satellite communication system. The FVNH can include each $S_{ij}^{(t)}$ for each statistic or status of each type of component in the satellite communication system. For example, the FVNH for time $t$ can be represented by Relationship 2 below:

$$\mathrm{FVNH}^{(t)} = \left[S_{11}^{(t)}, S_{12}^{(t)}, \ldots, S_{1j}^{(t)}, S_{21}^{(t)}, S_{22}^{(t)}, \ldots, S_{2j}^{(t)}, \ldots, S_{i1}^{(t)}, S_{i2}^{(t)}, \ldots, S_{ij}^{(t)}\right] \qquad \text{(Relationship 2)}$$
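
As a non-limiting illustration, Relationships 1 and 2 could be sketched as follows. The nested-list input layout and the function names `convex_sum` and `build_fvnh` are assumptions made for this example; the average and the convex sum are the two aggregation functions named above.

```python
from statistics import fmean


def convex_sum(values, weights):
    """Weighted sum whose weights are non-negative and sum to one."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(v * w for v, w in zip(values, weights))


def build_fvnh(instance_stats, aggregate=fmean):
    """Aggregate per-instance values and flatten them in Relationship 2 order.

    instance_stats[i][j] holds the values of statistic j reported by every
    instance of component type i at a single time t (assumed layout).
    """
    return [aggregate(values) for stats in instance_stats for values in stats]


# Two component types: the first has two statistics, the second has one.
example = [[[10.0, 20.0, 30.0], [1.0, 3.0]], [[0.5, 1.5]]]
print(build_fvnh(example))
print(convex_sum([10.0, 20.0], [0.25, 0.75]))
```

Passing the aggregation function as a parameter mirrors the role of the function in Relationship 1, so a different aggregation can be substituted per deployment without changing how the vector is assembled.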

In some implementations, the computer system 140 aggregates for multiple hierarchical levels within the system. For example, the computer system 140 can first aggregate at the service provider level by aggregating the same status and statistic data across each type of component for each service provider. Next, the computer system 140 can aggregate at the beam level by aggregating the same status and statistic data across all service providers for each individual beam. Next, the computer system 140 can aggregate at the gateway level by aggregating the same status and statistic data across each beam of each individual gateway. Finally, the computer system 140 aggregates for the satellite communication system by aggregating the same status and statistic data across all gateways of the satellite communication system.
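
As a non-limiting illustration, the hierarchical aggregation could be sketched for a single statistic as shown below. The nested dictionary layout (gateway, then beam, then service provider) and the use of an average of lower-level averages at each level are assumptions of this sketch, not requirements of the process.

```python
from statistics import fmean


def aggregate_hierarchy(system):
    """Average one statistic at provider, beam, gateway, and system level.

    system[gateway][beam][provider] is a list of per-instance values
    (assumed layout). Each level averages the results of the level below.
    """
    provider_level, beam_level, gateway_level = {}, {}, {}
    for gateway, beams in system.items():
        for beam, providers in beams.items():
            for provider, values in providers.items():
                provider_level[(gateway, beam, provider)] = fmean(values)
            beam_level[(gateway, beam)] = fmean(
                [provider_level[(gateway, beam, p)] for p in providers]
            )
        gateway_level[gateway] = fmean(
            [beam_level[(gateway, b)] for b in beams]
        )
    system_level = fmean(list(gateway_level.values()))
    return provider_level, beam_level, gateway_level, system_level


# Hypothetical gateway, beam, and provider identifiers.
demo = {
    "gw1": {"b1": {"spA": [1.0, 3.0], "spB": [4.0]}, "b2": {"spA": [6.0]}},
    "gw2": {"b3": {"spC": [2.0]}},
}
providers, beams, gateways, whole_system = aggregate_hierarchy(demo)
print(beams, gateways, whole_system)
```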

In another example, the computer system 140 can aggregate data by identifying, for a component of the satellite communication system, properties of multiple sub-components of the component at the time corresponding to the feature vector. The computer system 140 can determine a property that represents the multiple sub-components based on the properties of the multiple sub-components (e.g., by averaging or otherwise combining the properties of the sub-components). The computer system 140 can include, in the feature vector, the determined property as a property of the component.

In another example, the computer system 140 can identify, for a particular satellite beam, multiple components of a same type (e.g., multiple forward channel subsystems). The computer system 140 can identify multiple properties of each of the multiple components. The computer system 140 can determine, for each property, an aggregated value that represents an aggregation (e.g., average) of the property across each of the multiple components. The computer system 140 can include, in the feature vector, the aggregated value as a property of the satellite beam.

In step 312, the computer system 140 normalizes the values of the data. For example, the computer system 140 can normalize each piece of data to be within a range of zero and one, inclusive, or another appropriate range. The computer system 140 can normalize the aggregated values as well as the non-aggregated values.
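
As a non-limiting illustration, the normalization of step 312 could be sketched with min-max scaling. Min-max scaling, the assumption that lower and upper bounds are known for each statistic, and the clamping of out-of-range values are all choices made for this example; the process only requires that values fall within the chosen range.

```python
def min_max_normalize(value, lower, upper):
    """Scale a value into [0, 1] using known bounds, clamping outliers."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    scaled = (value - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))


# Hypothetical statistic with assumed bounds of 0 and 200.
print(min_max_normalize(50.0, 0.0, 200.0))
print(min_max_normalize(250.0, 0.0, 200.0))
```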

In step 314, the computer system 140 creates a multi-dimensional feature vector that includes, as feature values, each normalized value. The feature values represent the properties of the satellite communication system. The feature vector is also referred to herein as the FVNH as the feature values represent the status or health of the satellite communication system at a particular point in time.

In step 316, the computer system 140 performs dimensionality reduction on the FVNH. As described above, the computer system 140 can reduce the dimensionality of the FVNH using one or more dimensionality reduction techniques, such as feedforward neural networks and/or PCA. For example, the computer system 140 can use an m-stage feedforward neural network to reduce the number of dimensions of the FVNH from “n” to “m,” where “m” is less than “n” and “n” is the number of dimensions of the FVNH prior to dimensionality reduction. In some implementations, the computer system 140 can further reduce the dimensionality of the FVNH using PCA or another appropriate dimensionality reduction technique.

In operation, the accuracy of using, as input to the machine learning models, each reduced version can be monitored. If the accuracy using the reduced FVNH that was reduced to a dimensionality of “m” is greater than the accuracy of using the further reduced FVNH, then the reduced FVNHs may be used to detect the causes of conditions of the satellite communication system rather than the further reduced FVNHs.
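
As a non-limiting illustration, the accuracy monitoring could be sketched as a small bookkeeping class. The class name, the version labels, and the fallback to the further reduced FVNH when the accuracies are equal are assumptions of this sketch; the description above only states that the reduced FVNH may be used when its accuracy is greater.

```python
class AccuracyMonitor:
    """Track prediction accuracy separately for each FVNH version."""

    VERSIONS = ("reduced", "further_reduced")

    def __init__(self):
        self.correct = {v: 0 for v in self.VERSIONS}
        self.total = {v: 0 for v in self.VERSIONS}

    def record(self, version, predicted_cause, actual_cause):
        """Record one prediction once the actual cause is known."""
        self.total[version] += 1
        self.correct[version] += int(predicted_cause == actual_cause)

    def accuracy(self, version):
        total = self.total[version]
        return self.correct[version] / total if total else 0.0

    def preferred_version(self):
        """Prefer the reduced FVNH only when it is strictly more accurate."""
        if self.accuracy("reduced") > self.accuracy("further_reduced"):
            return "reduced"
        return "further_reduced"


monitor = AccuracyMonitor()
monitor.record("reduced", "rain_fade", "rain_fade")
monitor.record("further_reduced", "gateway_overload", "rain_fade")
print(monitor.preferred_version())
```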

In step 318, the reduced and/or further reduced FVNHs are output. For example, the FVNH(s) can be provided to a machine learning training module for use in training machine learning models, as described below with reference to FIG. 4. The FVNH(s) can also be provided as input to one or more machine learning models, as described below with reference to FIG. 5.

FIG. 4 is a flow diagram that illustrates an example process 400 for training machine learning models to detect causes of conditions of a satellite communication system. The process 400 can be performed by the computer system 140 of FIG. 1.

In step 402, feature vectors are labeled with time and context data. The feature vectors can be the FVNHs (e.g., reduced FVNHs) generated using the process 300 of FIG. 3. The time for each feature vector can be the time at which the properties represented by the feature vector were measured or detected. As described above, the context data for an FVNH can include a condition of the satellite communication system, a time period in which the condition occurred, a cause of the condition, one or more actions taken (or not taken) to alter the operation of the satellite communication system (e.g., to correct a problem or issue), categories of the causes, conditions, and/or actions, and/or other appropriate context data. The computer system 140 can obtain the context data from a user (e.g., a network operator or network expert). For example, the computer system 140 can provide an interface that prompts the user to provide the data, e.g., on a periodic basis. The context data for a particular point in time can be stored with (or with a reference to) the feature vector for that particular point in time.

In some implementations, the computer system 140 converts each cause (and other context data) into numeric form so that the cause can be used in mathematical equations when training the machine learning model(s). The cause can be encoded using a technique referred to as “one hot encoding” which converts a single field (e.g., a cause of a condition) into multiple fields having binary values.
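
As a non-limiting illustration, one hot encoding of a cause label could be sketched as follows. The cause names are hypothetical and are used only for this example.

```python
# Hypothetical cause labels (assumptions for illustration only).
KNOWN_CAUSES = ["rain_fade", "gateway_overload", "normal"]


def one_hot_encode(cause, known_causes=KNOWN_CAUSES):
    """Turn a single cause field into one binary field per known cause."""
    if cause not in known_causes:
        raise ValueError(f"unknown cause: {cause!r}")
    return [1 if candidate == cause else 0 for candidate in known_causes]


print(one_hot_encode("gateway_overload"))
```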

In step 404, the computer system 140 trains one or more machine learning models to detect (e.g., predict) the cause of a condition of a satellite communication system. The machine learning model(s) can be trained using the labelled training data (e.g., the labelled feature vectors). As described above, each of multiple machine learning models can be trained differently, e.g., using different training parameters. The different training parameters can include different types of machine learning models, different configurations (e.g., different tuning parameters, different optimizers, different numbers of layers, etc.) of a same type of machine learning model, different subsets of the labelled feature vectors, and/or other appropriate parameters. The machine learning model(s) can be configured to output one or more potential causes of the condition and, for each cause, a respective probability that the potential cause is the actual cause of the condition.

The machine learning models can also be configured to output a probability that the satellite communication system is operating normally. For example, the machine learning models can be trained to output an indication that the system is operating normally based on the feature vector(s) when the one or more machine learning models detect that the condition of the satellite communication system is normal based on the properties of the satellite communication system.

In step 406, the computer system 140 generates rules and/or machine learning models to select actions that resolve conditions of the satellite communication system. As described above, a set of rules can specify an action based on one or more of the causes having the highest probabilities. The set of rules can be generated and maintained by a network operator, network expert, or another user. As described above, the machine learning models for selecting the action can be trained to select an action based on the most likely cause and/or the feature vectors used to determine the potential causes and the probabilities of the potential causes.

FIG. 5 is a flow diagram that illustrates an example process 500 for using machine learning models to detect causes of conditions of a satellite communication system and performing a selected action to alter operation of the satellite communication system. The process 500 can be performed by the computer system 140 of FIG. 1.

In step 502, the computer system 140 obtains feature vectors that represent the current status of the satellite communication system. The feature vectors can be FVNHs (e.g., reduced FVNHs output by the process 300) that include feature values that represent properties of a satellite communication system. The obtained FVNHs can represent the status or health of the satellite communication system and may be unlabeled, e.g., FVNHs for which the cause of the condition of the satellite communication system is not known. The obtained FVNHs can be for a recent time period, e.g., for a time period from a current time extending back to a start time.

In step 504, the computer system 140 provides at least one of the feature vectors as input to the machine learning model(s) and receives machine learning output(s) from each machine learning model. For example, each machine learning model can output one or more potential causes of the condition of the satellite communication system and, for each potential cause, a probability that the potential cause is the actual cause. The machine learning model(s) can determine the machine learning outputs based on the properties of the satellite communication system represented by the feature vector(s).

In step 506, the computer system 140 identifies the cause of the condition of the satellite communication system indicated as being most likely based at least on the machine learning outputs. For example, if multiple machine learning models are used, the computer system 140 can combine the outputs of the models. In a particular example, the computer system 140 can determine, for each potential cause, a combined score (e.g., a combination of the probabilities) for the potential cause across all of the machine learning models. For example, the combined score for a particular cause can be the average of the probability of that cause output by the machine learning models. The computer system 140 can then identify, as the most likely cause, the potential cause having the highest combined score (e.g., the highest average across the models). If a single machine learning model is used, the computer system 140 can identify, as the most likely cause, the potential cause having the highest probability output by the model.
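
As a non-limiting illustration, combining the outputs of multiple models by averaging could be sketched as follows. The dictionary representation of each model output, the hypothetical cause names, and the treatment of a cause that a model does not report as having probability zero are assumptions of this sketch.

```python
from statistics import fmean


def most_likely_cause(model_outputs):
    """Average each cause's probability across models and pick the highest.

    model_outputs is a list with one dict per model, mapping a potential
    cause to its probability (assumed representation). A cause missing
    from a model's output is treated as probability 0.0 (assumption).
    """
    if not model_outputs:
        raise ValueError("at least one model output is required")
    causes = sorted(set().union(*model_outputs))
    combined = {
        cause: fmean([output.get(cause, 0.0) for output in model_outputs])
        for cause in causes
    }
    return max(combined, key=combined.get), combined


outputs = [
    {"rain_fade": 0.6, "gateway_overload": 0.4},
    {"rain_fade": 0.3, "gateway_overload": 0.7},
]
best_cause, combined_scores = most_likely_cause(outputs)
print(best_cause, combined_scores)
```

With a single model, the same function reduces to selecting the cause having the highest probability output by that model.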

As described above, the computer system 140 can identify the most likely cause based on machine learning outputs over a particular time period. In this example, the computer system 140 can identify, as the most likely cause, a potential cause that maintains the highest probability amongst the various potential causes for at least a threshold duration of time or a potential cause that has a probability that increases at least a threshold amount over a period of time or increases to become the highest probability amongst the potential causes. In a particular example, the computer system 140 can identify the most likely cause by identifying, for each potential cause and based on the machine learning outputs received over the particular time period, a sequence of probabilities that the potential cause is an actual cause of the condition of the system over the particular time period. The computer system 140 can select the particular cause based on the sequence of probabilities for each potential cause. For example, the computer system 140 can select a cause in response to the probabilities for the cause increasing over the sequence.
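
As a non-limiting illustration, two of the time-based criteria could be sketched as follows. Measuring the threshold duration as a number of consecutive outputs, comparing only the first and last outputs to measure an increase, and the function and cause names are assumptions of this sketch.

```python
def cause_by_persistence(history, min_consecutive):
    """Return the cause ranked highest in each of the most recent outputs.

    history is a list of dicts (oldest first), each mapping a potential
    cause to its probability. Returns None when no single cause has been
    ranked highest for the last min_consecutive outputs.
    """
    if min_consecutive < 1 or len(history) < min_consecutive:
        return None
    recent = history[-min_consecutive:]
    top_causes = {max(output, key=output.get) for output in recent}
    return top_causes.pop() if len(top_causes) == 1 else None


def causes_by_increase(history, min_increase):
    """Return causes whose probability rose by at least min_increase."""
    first, last = history[0], history[-1]
    return sorted(
        cause for cause in last
        if last[cause] - first.get(cause, 0.0) >= min_increase
    )


history = [
    {"rain_fade": 0.2, "gateway_overload": 0.8},
    {"rain_fade": 0.6, "gateway_overload": 0.4},
    {"rain_fade": 0.7, "gateway_overload": 0.3},
]
print(cause_by_persistence(history, 2))
print(causes_by_increase(history, 0.4))
```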

In another example, each machine learning model can be trained to output an indication of potential causes (and/or their probabilities) of the condition of the satellite communication system based on the properties of the satellite communication system during a particular time period represented by multiple feature vectors. In this example, the input to each machine learning model can be multiple feature vectors for the particular time period. Each feature vector can include feature values that represent properties of the satellite communication system at a time within the time period that differs from the time of each other feature vector. For example, one feature vector can be for a first point in time and a second feature vector can be for a second point in time different from the first point in time.

In step 508, the computer system 140 selects an action to alter the operation of the satellite communication system based at least on the most likely cause. For example, as described above, a set of rules or one or more machine learning models can be used to select an action based at least on the most likely cause. The computer system 140 can access a set of rules that specify, for each potential cause, one or more corresponding actions for altering the network operation of the satellite communication system. The computer system 140 can select, as the action to alter the network operation of the satellite communication system, at least one or more actions that correspond to the most likely cause.
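
As a non-limiting illustration, a set of rules that maps each potential cause to one or more corresponding actions could be sketched as a lookup table. The cause names and action names are hypothetical; an actual set of rules would be generated and maintained by a network operator, network expert, or another user.

```python
# Hypothetical rules (assumptions for illustration): each potential cause
# maps to one or more actions for altering network operation.
RULES = {
    "rain_fade": ["increase_transmit_power"],
    "gateway_overload": ["rebalance_traffic", "restart_ip_handler"],
}


def select_actions(most_likely_cause, rules=RULES):
    """Return the actions that correspond to the most likely cause."""
    return list(rules.get(most_likely_cause, []))


print(select_actions("gateway_overload"))
```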

The computer system 140 can also select the action based on other causes (e.g., the top n causes), the feature vectors used to detect the cause(s), data specifying whether actions were successful at resolving the condition in the past, and/or other appropriate data. For example, the computer system 140 can provide data indicating the most likely cause and the feature vector(s) as input to one or more machine learning models. The one or more machine learning models can be trained to receive a cause of a condition of the satellite communication system and at least one feature vector that includes feature values representing properties of the satellite communication system, and output an indication of one or more actions to alter the network operation of the system based on the cause and the feature vector(s).

The computer system 140 can receive, from each of the one or more machine learning models, one or more machine learning outputs that indicate one or more actions to alter the network operation of the system (e.g., and optionally a ranking of the actions) based on the cause and the feature vector(s). The computer system 140 can select the action to alter the network operation of the system based at least on the machine learning outputs. The computer system 140 can also select the action based on data specifying results of each of the one or more actions when each of the one or more actions was previously performed in response to a previous instance of satellite communication system conditions associated with the particular cause. For example, the computer system 140 can select the action that has the highest success percentage indicating the percentage of time the action was successful at resolving the condition. In some implementations, the machine learning models are trained to output the indication of the one or more actions based on results of previous actions performed in response to previous conditions of the system and associated causes of the previous conditions.
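
As a non-limiting illustration, selecting the action having the highest success percentage could be sketched as follows. The representation of past results as lists of boolean outcomes, the treatment of an action with no history as having a success percentage of zero, and the action names are assumptions of this sketch.

```python
def success_percentage(results):
    """Percentage of past attempts in which the action resolved the condition."""
    return 100.0 * sum(results) / len(results) if results else 0.0


def select_by_history(candidate_actions, past_results):
    """Pick the candidate action with the highest success percentage.

    past_results maps an action to a list of booleans, one per previous
    attempt for the same cause (assumed representation).
    """
    return max(
        candidate_actions,
        key=lambda action: success_percentage(past_results.get(action, [])),
    )


past = {
    "rebalance_traffic": [True, False, False, False],
    "restart_ip_handler": [True, True, False, True],
}
print(select_by_history(["rebalance_traffic", "restart_ip_handler"], past))
```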

In step 510, the computer system 140 provides an indication of the most likely cause and/or the selected action. For example, the computer system 140 can generate and provide a user interface that presents the most likely cause (and optionally other causes, such as the top n potential causes) and/or the selected action (and optionally other potential actions, such as the top n actions that could resolve the condition). In another example, the computer system 140 can provide the indication by way of e-mail, text message, or a spoken language interface.

In step 512, the selected action is performed. For example, the computer system 140 can prompt a user (e.g., network operator) whether to perform the selected action (or one of the other actions). In another example, the computer system 140 can initiate the selected action automatically, e.g., if the action or cause is categorized in a set of categories for which actions can be performed automatically or if a threshold duration of time has elapsed since the cause was detected.

The computer system 140 can determine to perform the selected action (e.g., without receiving a command from a network operator or other user to perform the selected action) based on a duration of time between providing the indication of the cause of the condition of the satellite communication system and receiving a user command to perform the selected action exceeding a threshold duration. In another example, the computer system 140 can determine to perform the selected action without receiving a command from a user to perform the selected action based on a category of the cause (e.g., the computer system 140 can perform actions for some categories of causes without user input while not performing the action for other categories). In another example, the computer system 140 can determine to perform the selected action without receiving a command from a user to perform the selected action based on a severity of the cause and/or a severity of the condition of the satellite communication system (e.g., if the cause or condition has at least a threshold level of severity, the computer system 140 can perform the action automatically without user input).
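
As a non-limiting illustration, the three bases for performing an action without a user command could be sketched as a single check. The category name, the numeric severity scale, the threshold values, and the decision to treat any one satisfied criterion as sufficient are assumptions of this sketch.

```python
from dataclasses import dataclass

# Hypothetical policy values (assumptions for illustration only).
AUTO_CATEGORIES = frozenset({"configuration"})
SEVERITY_THRESHOLD = 8
WAIT_THRESHOLD_SECONDS = 900.0


@dataclass
class Detection:
    cause_category: str
    severity: int
    seconds_since_notified: float  # time waited for a user command


def should_auto_perform(detection: Detection) -> bool:
    """Return True if the action may be performed without a user command."""
    return (
        detection.cause_category in AUTO_CATEGORIES
        or detection.severity >= SEVERITY_THRESHOLD
        or detection.seconds_since_notified > WAIT_THRESHOLD_SECONDS
    )


print(should_auto_perform(Detection("hardware", 3, 120.0)))
print(should_auto_perform(Detection("hardware", 9, 120.0)))
```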

After the selected action is performed, the computer system 140 can determine whether the selected action resolved the condition (e.g., corrected a problem with the system). For example, as described above, the computer system 140 can continuously process feature vectors to determine a cause of a condition of the satellite communication system. If the output(s) of the machine learning models change (e.g., from the cause for which the action was selected to normal), the computer system 140 can determine that the action resolved the condition. If the output(s) of the machine learning models remain the same (e.g., the most likely cause remains the same although the probabilities for each cause may vary) for at least a threshold duration of time, the computer system 140 can determine that the action did not resolve the condition.
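
As a non-limiting illustration, the determination of whether an action resolved the condition could be sketched as follows. Measuring the threshold duration as a number of model outputs, treating any change of the most likely cause as a resolution, and the "pending" result for the interval before the threshold is reached are assumptions of this sketch.

```python
def action_outcome(causes_since_action, treated_cause, min_unchanged):
    """Classify the outcome of an action from subsequent model outputs.

    causes_since_action lists the most likely cause for each output
    received after the action was performed, oldest first.
    """
    if any(cause != treated_cause for cause in causes_since_action):
        return "resolved"
    if len(causes_since_action) >= min_unchanged:
        return "not_resolved"
    return "pending"


# Hypothetical cause names.
print(action_outcome(["rain_fade", "normal"], "rain_fade", 3))
print(action_outcome(["rain_fade"] * 3, "rain_fade", 3))
```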

In step 514, the computer system 140 updates the machine learning model(s). For example, if the condition is resolved, the computer system 140 can label the feature vectors used to detect the cause with the actual cause and other context data (e.g., the action that resolved the condition). The labelled feature vectors can then be used, with other labelled feature vectors (e.g., the ones previously used to train the models), to update the machine learning model(s). If the condition is not resolved, the computer system 140 can label the feature vectors used to detect the cause with data indicating that the selected action did not resolve the condition.

The computer system 140 can also receive additional training data that includes a set of additional feature vectors, e.g., labelled feature vectors that have been labelled with context data. Each additional feature vector can include feature values that represent properties of the satellite communication system detected at a time corresponding to the additional feature vector. The additional training data can also include labels for the additional feature vectors. The label for a feature vector can specify a cause of a condition of the satellite communication system at the time corresponding to the additional feature vector. The computer system 140 can train each machine learning model using the additional training data, e.g., in combination with the training data previously used to train the machine learning model(s).

In step 516, the computer system 140 (or another computer system) uses the machine learning model(s) to detect causes of conditions of other satellite communication systems. For example, the same machine learning model(s) can be used to detect causes of conditions in similar satellite communication systems and satellite communication systems for which the amount of labelled data is not sufficient to train accurate models. The machine learning models can then be updated based on feature vectors used to detect causes of conditions of the other satellite communication systems and labels of causes determined to be the causes of the conditions.

The computer system 140 (or another computer system) can identify similar satellite communication systems using one or more criteria. The criteria can include types and arrangements of components in the systems, spectrum size, data traffic (e.g., application level traffic), geographic location, and/or other appropriate criteria. For example, if two satellite communication systems have similar sized (e.g., within a threshold spectrum amount) forward and/or return links, the computer system 140 can determine that the two systems are similar.

In another example, if at least a portion of the application level traffic (e.g., at least 50% of the application level traffic) is of a same type, the computer system 140 can determine that the two systems are similar. In a particular example, two systems that are primarily used by small offices and home users may have a common type of application level traffic and two systems that are primarily used for enterprise applications may have a common type of application level traffic. In this example, the computer system 140 can determine that the two systems primarily used by small offices and home users are similar. Similarly, the computer system 140 can determine that the two systems primarily used for enterprise applications are similar.

In yet another example, satellite communication systems in the same geographic region (e.g., same country, continent, state, etc.) may have similar architecture or traffic patterns. In this example, the computer system 140 can determine that the systems are similar if they are located in the same geographic region.
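
As a non-limiting illustration, the similarity criteria described above could be sketched as follows. The profile fields, the 50 MHz spectrum tolerance, the traffic type and region labels, and the decision to treat any one satisfied criterion as sufficient are assumptions of this sketch; the 50% traffic share follows the example given above.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    forward_link_mhz: float        # spectrum size of the forward link
    dominant_traffic_type: str     # most common application level traffic
    dominant_traffic_share: float  # fraction of traffic of that type, 0..1
    region: str


def are_similar(a, b, spectrum_tolerance_mhz=50.0, min_share=0.5):
    """Return True if two systems meet any of the sketched criteria."""
    similar_spectrum = (
        abs(a.forward_link_mhz - b.forward_link_mhz) <= spectrum_tolerance_mhz
    )
    same_traffic = (
        a.dominant_traffic_type == b.dominant_traffic_type
        and a.dominant_traffic_share >= min_share
        and b.dominant_traffic_share >= min_share
    )
    same_region = a.region == b.region
    return similar_spectrum or same_traffic or same_region


home_a = SystemProfile(500.0, "web", 0.6, "EU")
home_b = SystemProfile(900.0, "web", 0.7, "NA")
enterprise = SystemProfile(900.0, "enterprise", 0.8, "AS")
print(are_similar(home_a, home_b), are_similar(home_a, enterprise))
```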

Embodiments of the invention and all of the functional operations described in this specification may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the invention may be implemented, in part, as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium may be a non-transitory computer readable storage medium, a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus may include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows may also be performed by, and apparatus may also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

While this specification contains many specifics, these should not be construed as limitations on the scope of the invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the invention. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims may be performed in a different order and still achieve desirable results.

What is claimed is:
 1. A method performed by one or more processors, the method comprising: obtaining, by the one or more processors, feature values representing properties of a communication system, the properties including at least a measure of network traffic for one or more elements of the communication system; providing, by the one or more processors, the feature values as input to one or more machine learning models, wherein each of the one or more machine learning models is configured to receive feature values representing properties of the communication system and has been trained, using example data sets for one or more communication systems, to output an indication of potential causes of a condition of the communication system based on the properties of the communication system indicated by received feature values; receiving, by the one or more processors and from each of the one or more machine learning models, one or more machine learning model outputs that indicate one or more potential causes of a condition of the communication system based on the properties of the communication system represented by the feature values; determining, by the one or more processors and based on the one or more machine learning model outputs received from each of the one or more machine learning models, a particular cause of the condition of the communication system; and providing, to a device, an indication of the particular cause of the condition of the communication system.
 2. The method of claim 1, wherein: the one or more machine learning models include multiple machine learning models; the one or more machine learning model outputs include, for each potential cause of the condition of the satellite communication system, a probability that the potential cause is an actual cause of the condition of the satellite communication system; and determining the particular cause of the condition of the satellite communication system comprises determining the particular cause based on one or more combined scores generated by determining, for each of the one or more potential causes, a combination of the probabilities output by the machine learning models for the potential cause.
 3. The method of claim 1, further comprising: selecting, by the one or more processors, an action to alter network operation of the satellite communication system based at least on the particular cause; and causing the selected action to be performed.
 4. The method of claim 1, further comprising selecting, by the one or more processors, an action to alter network operation of the satellite communication system based at least on the particular cause, the selecting comprising: receiving, by the one or more processors, one or more sets of feature values that indicate at least one of (i) properties of the communication system or (ii) an indication of the particular cause of the condition of the communication system; providing, by the one or more processors, the one or more sets of feature values as input to one or more additional machine learning models configured to generate, based on input feature values, scores for one or more candidate actions that alter operation of the communication system; receiving, by the one or more processors, a score for each of the one or more candidate actions that alter the operation of the communication system, wherein each score was generated using the one or more additional machine learning models based on the one or more sets of feature values; selecting, by the one or more processors, at least one of the candidate actions that alter the operation of the communication system based on the scores for the one or more candidate actions; and providing, to the device, an indication of the candidate actions that alter the operation of the communication system that were selected based on the scores generated using the one or more machine learning models.
 5. The method of claim 4, wherein the one or more sets of feature values comprise data indicating results of previous actions indicated by the one or more second machine learning outputs when the previous actions were performed in response to a previous instance of communication system condition associated with the particular cause.
 6. The method of claim 4, wherein the one or more additional machine learning models are trained to output the scores for one or more candidate actions that alter operation of the communication system based on results of previous actions performed in response to previous conditions of the communication system and associated causes of the previous conditions.
 7. The method of claim 1, wherein the properties include one or more properties for each of a plurality of components of the communication system.
 8. The method of claim 7, wherein the properties include network traffic measured at one or more of the plurality of components of the communication system.
 9. A system, comprising: one or more processors; and one or more non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: obtaining feature values representing properties of a communication system, the properties including at least a measure of network traffic for one or more elements of the communication system; providing the feature values as input to one or more machine learning models, wherein each of the one or more machine learning models is configured to receive feature values representing properties of the communication system and has been trained, using example data sets for one or more communication systems, to output an indication of potential causes of a condition of the communication system based on the properties of the communication system indicated by received feature values; receiving, from each of the one or more machine learning models, one or more machine learning model outputs that indicate one or more potential causes of a condition of the communication system based on the properties of the communication system represented by the feature values; determining, based on the one or more machine learning model outputs received from each of the one or more machine learning models, a particular cause of the condition of the communication system; and providing, to a device, an indication of the particular cause of the condition of the communication system.
 10. The system of claim 9, wherein: the one or more machine learning models include multiple machine learning models; the one or more machine learning model outputs include, for each potential cause of the condition of the satellite communication system, a probability that the potential cause is an actual cause of the condition of the satellite communication system; and determining the particular cause of the condition of the satellite communication system comprises determining the particular cause based on one or more combined scores generated by determining, for each of the one or more potential causes, a combination of the probabilities output by the machine learning models for the potential cause.
 11. The system of claim 9, wherein the operations comprise: selecting, by the one or more processors, an action to alter network operation of the satellite communication system based at least on the particular cause; and causing the selected action to be performed.
 12. The system of claim 9, wherein the operations comprise selecting an action to alter network operation of the satellite communication system based at least on the particular cause, the selecting comprising: receiving one or more sets of feature values that indicate at least one of (i) properties of the communication system or (ii) an indication of the particular cause of the condition of the communication system; providing the one or more sets of feature values as input to one or more additional machine learning models configured to generate, based on input feature values, scores for one or more candidate actions that alter operation of the communication system; receiving a score for each of the one or more candidate actions that alter the operation of the communication system, wherein each score was generated using the one or more additional machine learning models based on the one or more sets of feature values; selecting at least one of the candidate actions that alter the operation of the communication system based on the scores for the one or more candidate actions; and providing, to the device, an indication of the candidate actions that alter the operation of the communication system that were selected based on the scores generated using the one or more machine learning models.
 13. The system of claim 12, wherein the one or more sets of feature values comprise data indicating results of previous actions indicated by the one or more second machine learning outputs when the previous actions were performed in response to a previous instance of communication system condition associated with the particular cause.
 14. The system of claim 12, wherein the one or more additional machine learning models are trained to output the scores for one or more candidate actions that alter operation of the communication system based on results of previous actions performed in response to previous conditions of the communication system and associated causes of the previous conditions.
 15. The system of claim 9, wherein the properties include one or more properties for each of a plurality of components of the communication system.
 16. The system of claim 15, wherein the properties include network traffic measured at one or more of the plurality of components of the communication system.
 17. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: obtaining feature values representing properties of a communication system, the properties including at least a measure of network traffic for one or more elements of the communication system; providing the feature values as input to one or more machine learning models, wherein each of the one or more machine learning models is configured to receive feature values representing properties of the communication system and has been trained, using example data sets for one or more communication systems, to output an indication of potential causes of a condition of the communication system based on the properties of the communication system indicated by received feature values; receiving, from each of the one or more machine learning models, one or more machine learning model outputs that indicate one or more potential causes of a condition of the communication system based on the properties of the communication system represented by the feature values; determining, based on the one or more machine learning model outputs received from each of the one or more machine learning models, a particular cause of the condition of the communication system; and providing, to a device, an indication of the particular cause of the condition of the communication system.
 18. The non-transitory computer-readable media of claim 17, wherein: the one or more machine learning models include multiple machine learning models; the one or more machine learning model outputs include, for each potential cause of the condition of the satellite communication system, a probability that the potential cause is an actual cause of the condition of the satellite communication system; and determining the particular cause of the condition of the satellite communication system comprises determining the particular cause based on one or more combined scores generated by determining, for each of the one or more potential causes, a combination of the probabilities output by the machine learning models for the potential cause.
 19. The non-transitory computer-readable media of claim 17, wherein the operations comprise: selecting, by the one or more computers, an action to alter network operation of the satellite communication system based at least on the particular cause; and causing the selected action to be performed.
 20. The non-transitory computer-readable media of claim 17, wherein the operations comprise selecting an action to alter network operation of the satellite communication system based at least on the particular cause, the selecting comprising: receiving one or more sets of feature values that indicate at least one of (i) properties of the communication system or (ii) an indication of the particular cause of the condition of the communication system; providing the one or more sets of feature values as input to one or more additional machine learning models configured to generate, based on input feature values, scores for one or more candidate actions that alter operation of the communication system; receiving a score for each of the one or more candidate actions that alter the operation of the communication system, wherein each score was generated using the one or more additional machine learning models based on the one or more sets of feature values; selecting at least one of the candidate actions that alter the operation of the communication system based on the scores for the one or more candidate actions; and providing, to the device, an indication of the candidate actions that alter the operation of the communication system that were selected based on the scores generated using the one or more machine learning models.
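
By way of illustration only, and without limiting the claims, the combination of per-cause probabilities output by multiple machine learning models, followed by score-based selection of a candidate action, might be sketched as follows. The claims recite only "a combination" of the probabilities; the arithmetic mean used here is an assumed choice, and the function names, cause labels, and action labels are hypothetical and do not appear in the claims.

```python
from statistics import mean


def combine_cause_probabilities(model_outputs):
    """Combine per-cause probabilities from several models into one score per cause.

    model_outputs: list of dicts, one per model, mapping a cause label to the
    probability that model assigns to it. The mean is an assumed combination;
    a cause that a model does not report is treated as probability 0.0.
    """
    if not model_outputs:
        raise ValueError("at least one model output is required")
    causes = set().union(*model_outputs)
    return {
        cause: mean(output.get(cause, 0.0) for output in model_outputs)
        for cause in causes
    }


def most_likely_cause(model_outputs):
    """Return the cause with the highest combined score (ties broken alphabetically)."""
    combined = combine_cause_probabilities(model_outputs)
    return max(sorted(combined), key=combined.get)


def select_action(action_scores):
    """Return the candidate action with the highest score (ties broken alphabetically)."""
    if not action_scores:
        raise ValueError("at least one candidate action is required")
    return max(sorted(action_scores), key=action_scores.get)


# Hypothetical outputs from three models for two hypothetical causes.
outputs = [
    {"rain_fade": 0.7, "gateway_overload": 0.3},
    {"rain_fade": 0.5, "gateway_overload": 0.5},
    {"rain_fade": 0.6, "gateway_overload": 0.4},
]
cause = most_likely_cause(outputs)

# Hypothetical scores, standing in for the output of an additional model.
action = select_action({"reroute_traffic": 0.2, "increase_power": 0.9})
```

Other combinations, such as a weighted average or a product of the probabilities, would fit the same structure by replacing the call to `mean`.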